126 research outputs found

    Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

    Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.
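
The paralogy quotient quoted in this abstract is consistent with a simple ratio: the number of orthologous sets in the descendant (LECA) divided by the number of ancestral sets (FECA) they map back to. The sketch below only reproduces that arithmetic under this assumed definition; the function name `paralogy_quotient` is hypothetical and does not come from the paper.

```python
def paralogy_quotient(descendant_sets: int, ancestral_sets: int) -> float:
    """Ratio of descendant orthologous sets to the ancestral sets they map to.

    Assumed definition, inferred from the numbers given in the abstract.
    """
    if ancestral_sets <= 0:
        raise ValueError("ancestral_sets must be positive")
    return descendant_sets / ancestral_sets


# Counts taken from the abstract: 4137 LECA sets, 2150 FECA sets.
pq = paralogy_quotient(4137, 2150)
print(f"{pq:.2f}")  # prints 1.92
```

A quotient of 1.0 would mean no net ancestral duplication, so the prokaryotic values of 1.19 and 1.25 sit much closer to that baseline than the eukaryotic 1.92.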

    Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences

    BACKGROUND: Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment. RESULTS: We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the beta-lactamase-like superfamily of metal-dependent hydrolases. CONCLUSIONS: In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.

    Small CRISPR RNAs guide antiviral defense in prokaryotes

    Prokaryotes acquire virus resistance by integrating short fragments of viral nucleic acid into clusters of regularly interspaced short palindromic repeats (CRISPRs). Here we show how virus-derived sequences contained in CRISPRs are used by CRISPR-associated (Cas) proteins from the host to mediate an antiviral response that counteracts infection. After transcription of the CRISPR, a complex of Cas proteins termed Cascade cleaves a CRISPR RNA precursor in each repeat and retains the cleavage products containing the virus-derived sequence. Assisted by the helicase Cas3, these mature CRISPR RNAs then serve as small guide RNAs that enable Cascade to interfere with virus proliferation. Our results demonstrate that the formation of mature guide RNAs by the CRISPR RNA endonuclease subunit of Cascade is a mechanistic requirement for antiviral defense.

    Optimal data partitioning, multispecies coalescent and Bayesian concordance analyses resolve early divergences of the grape family (Vitaceae)

    Evolutionary rate heterogeneity and rapid radiations are common phenomena in organismal evolution and represent major challenges for reconstructing deep-level phylogenies. Here we detected substantial conflicts in and among data sets as well as uncertainty concerning relationships among lineages of Vitaceae from individual gene trees, supernetworks and tree certainty values. Congruent deep-level relationships of Vitaceae were retrieved by comprehensive comparisons of results from optimal partitioning analyses, multispecies coalescent approaches and the Bayesian concordance method. We found that partitioning schemes selected by PartitionFinder were preferred over those by gene or by codon position, and the unpartitioned model usually performed the worst. For a data set with conflicting signals, however, the unpartitioned model outperformed models that included more partitions, demonstrating some limitations to the effectiveness of concatenation for these data. For a transcriptome data set, fast coalescent methods (STAR and MP-EST) and a Bayesian concordance approach yielded congruent topologies with trees from the concatenated analyses and previous studies. Our results highlight that well-resolved gene trees are critical for the effectiveness of coalescent-based methods. Future efforts to improve the accuracy of phylogenomic analyses should emphasize the development of new methods that can accommodate multiple biological processes and tolerate missing data while remaining computationally tractable.
(C) The Willi Hennig Society 2017. National Natural Science Foundation of China [NNSF 31500179, 31590822, 31270268]; National Basic Research Program of China [2014CB954101]; National Science Foundation [DEB0743474]; Smithsonian Scholarly Studies Grant Program and the Endowment Grant Program; CAS/SAFEA International Partnership Program for Creative Research Teams; Laboratory of Analytical Biology of the National Museum of Natural History, Smithsonian Institution; Science and Technology Basic Work [2013FY112100].

    A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

    We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
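
To make the idea of a Markov model on gene family size concrete, here is a minimal Python sketch of one common formulation: a linear birth-death process with a constant gain term. It assumes duplication and loss act per gene copy and transfer adds copies at a constant rate; those assumptions, the truncation at `max_size`, the function names, and the numeric rates are all illustrative and are not taken from the paper. It is also not the paper's O(N+hM^2) likelihood algorithm: it only computes transition probabilities along a single branch, using uniformization.

```python
import math


def rate_matrix(max_size, dup, loss, transfer):
    """Generator matrix for family sizes 0..max_size (truncated state space).

    Assumed rates: n -> n+1 at n*dup + transfer, n -> n-1 at n*loss.
    """
    q = [[0.0] * (max_size + 1) for _ in range(max_size + 1)]
    for n in range(max_size + 1):
        if n < max_size:
            q[n][n + 1] = n * dup + transfer
        if n > 0:
            q[n][n - 1] = n * loss
        q[n][n] = -sum(q[n])  # each row of a generator sums to zero
    return q


def transition_probs(q, t, terms=200):
    """Approximate exp(Q*t) by uniformization (Poisson-weighted powers)."""
    size = len(q)
    lam = max(-q[i][i] for i in range(size)) or 1.0
    # U = I + Q/lam is a stochastic matrix because lam bounds every exit rate.
    u = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(size)]
         for i in range(size)]
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    weight = math.exp(-lam * t)  # Poisson(0; lam*t)
    probs = [[weight * x for x in row] for row in power]
    for k in range(1, terms):
        power = [[sum(power[i][m] * u[m][j] for m in range(size))
                  for j in range(size)] for i in range(size)]
        weight *= lam * t / k  # Poisson(k; lam*t)
        for i in range(size):
            for j in range(size):
                probs[i][j] += weight * power[i][j]
    return probs


# Illustrative (made-up) rates and branch length.
q = rate_matrix(max_size=10, dup=0.3, loss=0.4, transfer=0.05)
p = transition_probs(q, t=0.5)
print(p[1][0], p[1][1], p[1][2])  # P(1 copy -> 0, 1 or 2 copies) on this branch
```

Because the transfer term is nonzero, a family of size 0 can be regained, which is what distinguishes this kind of model from a pure duplication-loss process.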

    Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss

    Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication-loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution. National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.). Assembling the Tree of Life (Program) (Grant 0936234)

    Panspermia, Past and Present: Astrophysical and Biophysical Conditions for the Dissemination of Life in Space

    Astronomically, there are viable mechanisms for distributing organic material throughout the Milky Way. Biologically, the destructive effects of ultraviolet light and cosmic rays mean that the majority of organisms arrive broken and dead on a new world. The likelihood of conventional forms of panspermia must therefore be considered low. However, the information content of damaged biological molecules might serve to seed new life (necropanspermia). Comment: Accepted for publication in Space Science Review

    Evolution of regulatory signatures in primate cortical neurons at cell-type resolution

    The human cerebral cortex contains many cell types that likely underwent independent functional changes during evolution. However, cell-type-specific regulatory landscapes in the cortex remain largely unexplored. Here we report epigenomic and transcriptomic analyses of the two main cortical neuronal subtypes, glutamatergic projection neurons and GABAergic interneurons, in human, chimpanzee, and rhesus macaque. Using genome-wide profiling of the H3K27ac histone modification, we identify neuron-subtype-specific regulatory elements that previously went undetected in bulk brain tissue samples. Human-specific regulatory changes are uncovered in multiple genes, including those associated with language, autism spectrum disorder, and drug addiction. We observe preferential evolutionary divergence in neuron subtype-specific regulatory elements and show that a substantial fraction of pan-neuronal regulatory elements undergoes subtype-specific evolutionary changes. This study sheds light on the interplay between regulatory evolution and cell-type-dependent gene-expression programs, and provides a resource for further exploration of human brain evolution and function.

    Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: The class II cytokine receptors and their ligands in mammals and fish

    BACKGROUND: The high degree of sequence conservation between coding regions in fish and mammals can be exploited to identify genes in mammalian genomes by comparison with the sequence of similar genes in fish. Conversely, experimentally characterized mammalian genes may be used to annotate fish genomes. However, gene families that escape this principle include the rapidly diverging cytokines that regulate the immune system, and their receptors. A classic example is the class II helical cytokines (HCII) including type I, type II and lambda interferons, IL10-related cytokines (IL10, IL19, IL20, IL22, IL24 and IL26) and their receptors (HCRII). Despite the report of a near-complete pufferfish (Takifugu rubripes) genome sequence, these genes remain undescribed in fish. RESULTS: We have used an original strategy based both on conserved amino acid sequence and gene structure to identify HCII and HCRII in the genome of another pufferfish, Tetraodon nigroviridis, which is amenable to laboratory experiments. The 15 genes that were identified are highly divergent and include a single interferon molecule, three IL10-related cytokines and their potential receptors together with two Tissue Factor (TF) genes. Some of these genes form tandem clusters on the Tetraodon genome. Their expression pattern was determined in different tissues. Most importantly, Tetraodon interferon was identified and we show that the recombinant protein can induce antiviral MX gene expression in Tetraodon primary kidney cells. Similar results were obtained in zebrafish, which has 7 MX genes. CONCLUSION: We propose a scheme for the evolution of HCII and their receptors during the radiation of bony vertebrates and suggest that the diversification that played an important role in the fine-tuning of the ancestral mechanism for host defense against infections probably followed different pathways in amniotes and fish.